Skip to content

FIX #24374 - Data Contract at Data Product level#25314

Merged
pmbrull merged 34 commits intomainfrom
dp-dc-impl
Jan 23, 2026
Merged

FIX #24374 - Data Contract at Data Product level#25314
pmbrull merged 34 commits intomainfrom
dp-dc-impl

Conversation

@pmbrull
Copy link
Copy Markdown
Collaborator

@pmbrull pmbrull commented Jan 15, 2026

Describe your changes:

FIX #24374

  • assets inherit the data contract from the data product
  • even if an asset has a data contract, if some key fields are empty: terms of service, SLA, security, they are inherited
  • semantic rules are merged between asset and data contract
  • if an asset has multiple data products, it won't inherit the contract
Untitled.mov

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Data contract inheritance:
    • Assets inherit contract properties from parent Data Product when belonging to exactly one DP
    • Multi-DP assets skip inheritance to avoid ambiguity
  • Selective field merging:
    • termsOfUse, security, and sla inherited only if empty in asset contract
    • Semantic rules merged without duplicates (asset rules take precedence)
  • Schema changes:
    • Added inherited boolean flag to DataContract, termsOfUse, security, sla, and SemanticsRule
    • Migrated existing termsOfUse from string to object with content and inherited fields via MySQL/PostgreSQL migrations
  • Backend implementation:
    • getEffectiveDataContract() in DataContractRepository computes inherited contract
    • DataContractValidationApp refactored to validate inherited contracts across Data Product hierarchies
  • Frontend visual indicators:
    • Inheritance icons with tooltips on all inherited fields in ContractDetail, ContractSemantics, ContractSLA
    • Comprehensive E2E validation in DataContractInheritance.spec.ts (1287 lines) covering inheritance scenarios

This will update automatically on new commits.


@github-actions
Copy link
Copy Markdown
Contributor

TypeScript types have been updated based on the JSON schema changes in the PR

@sonarqubecloud
Copy link
Copy Markdown

@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

gitar-bot and others added 4 commits January 21, 2026 06:01
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
@pmbrull
Copy link
Copy Markdown
Collaborator Author

pmbrull commented Jan 21, 2026

gitar auto-apply:off

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Jan 22, 2026

🔍 CI failure analysis for 8651713: All CI failures are infrastructure-related (disk space exhaustion) or unrelated test failures. None are caused by this PR's Data Contract/Product inheritance changes.

Issue

Latest Maven SonarCloud CI Failure (Job 61195160379)

Infrastructure Failure: Disk space exhaustion on GitHub Actions runner

System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/_diag/Worker_20260122-200302-utc.log'

The runner ran out of disk space during the Maven build process, causing the job to fail. This is a GitHub Actions infrastructure issue, not a code problem.

Python Integration Test Failure (Job 61186845538)

Test Failure: test_it_returns_the_expected_classifications in ingestion/tests/integration/auto_classification/test_tag_processor.py:150

AssertionError: assert [] == [IsInstance(<...s matched:')]

The test expects PII classifications to be detected but receives an empty list. This is the same failure from previous runs.

Test Statistics: 519 passed, 1 failed, 21 skipped (99.8% success rate)

Previous Maven SonarCloud CI Failure (Job 61172493302)

Test Failure: DataInsightChartResourceTest.put_entityEmptyDescriptionUpdate_200

The test expects entity history version ordering to be descending but returns false, suggesting a version ordering issue.

Test Statistics: 7900 out of 7901 tests passed (99.99% success rate)

Previous Infrastructure Failures

  • Disk Space: Job 61172872832 ran out of disk space during Maven PostgreSQL build
  • Playwright Tests: Multiple shards with flaky test failures

Root Cause

All failures are completely unrelated to this PR's changes.

This PR modifies Data Contract and Data Product inheritance functionality:

  • Backend: Data Contract entity schema, migrations, Java repository classes
  • Frontend: Data Contract & Data Product UI components, E2E tests, locale files
  • Configuration: .gitignore, Claude AI skills file

What this PR does NOT touch:

  • Auto-classification code or PII detection logic
  • DataInsightChartResourceTest or entity history versioning
  • Any infrastructure or test environment configuration

Details

Why the Latest Maven SonarCloud Failure Is Unrelated

  1. Infrastructure Issue: The failure is caused by disk space exhaustion on the GitHub Actions runner, not by any code changes
  2. GitHub Actions Problem: The error occurs in the runner's internal diagnostic logging system
  3. Pattern Recognition: This is the second disk space failure in this PR's CI runs (previously job 61172872832)
  4. Solution Required: The CI pipeline should be reconfigured to clean up disk space more aggressively, or the build process should be optimized to use less disk space

Why Other Failures Are Unrelated

  1. Zero Code Overlap: This PR doesn't modify auto-classification, PII detection, or DataInsightChart logic
  2. Isolated Feature: Data Contract inheritance is completely separate from these features
  3. Test Environment Issues: Consistent failures suggest test data setup or environment configuration problems
  4. All PR-Specific Tests Pass: Data Contract and Data Product tests all passed successfully

Evidence of Successful Implementation

  • 99.8% test success rate in Python integration tests (519 of 520 passed)
  • 99.99% test success rate in Maven builds where disk space was sufficient (7900 of 7901 passed)
  • All Data Contract-specific backend tests pass
  • All Data Product-specific frontend E2E tests pass
  • Failures are in completely unrelated features
Code Review 👍 Approved with suggestions 20 resolved / 24 findings

Well-structured implementation of Data Contract inheritance from Data Products. The schema changes, migrations, and backend logic are comprehensive with good test coverage. Previous findings remain as minor edge case considerations.

More details 💡 4 suggestions ✅ 20 resolved
💡 Edge Case: Semantic rules deduplication uses name only, ignoring rule content

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataContractRepository.java

In mergeContracts(), semantic rules from the data product are skipped if an entity rule has the same name, using entityRuleNames.contains(dpRule.getName()). However, this only checks the rule name, not the rule content or logic.

If an entity has a rule named "DataQuality" with one definition, and the data product has a rule named "DataQuality" with a different definition, the entity's rule wins entirely. This may be the intended behavior (as stated in the PR - "asset rules take precedence"), but there's no way to detect or warn about this conflict.

Consider logging when a rule name collision is detected so administrators can understand why certain data product rules aren't being applied.

💡 Edge Case: Race condition when checking Data Product membership count

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataContractRepository.java:655

The getEffectiveDataContract() method checks if an entity belongs to multiple Data Products and skips inheritance:

if (dataProducts.size() > 1) {
  LOG.debug(
      "Entity {} belongs to {} data products. Skipping contract inheritance to avoid ambiguity.",
      entity.getId(),
      dataProducts.size());
  return entityContract;
}

If an asset is being added to a second Data Product while another request is computing the effective contract, there could be a race condition where:

  1. Thread A checks: asset has 1 DP → proceeds with inheritance
  2. Thread B adds asset to second DP
  3. Thread A returns inherited contract from DP1

This is likely acceptable since the next request will see the correct state, but it could lead to brief inconsistency. Consider documenting this eventual consistency behavior in the API documentation.

💡 Performance: Multiple database lookups per asset in batch validation

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/dataContracts/DataContractValidationApp.java:143

In processDataProductContracts(), for each asset in a Data Product, the code makes multiple database calls:

  1. Entity.getEntity() to fetch the full asset
  2. contractRepository.getEntityDataContractSafely(asset) to check for existing contract
  3. contractRepository.materializeInheritedContract() to create new contract
  4. contractRepository.getEffectiveDataContract(asset) which again fetches contracts

When processing many assets (pagination is set to 100 per batch), this could result in significant database load. Consider:

  1. Batching the initial contract existence checks
  2. Caching the Data Product contract instead of potentially re-fetching it in getEffectiveDataContract
  3. Using a single transaction for materialization and validation per batch

This is acceptable for the scheduled app execution but could become problematic with large Data Products containing thousands of assets.

💡 Edge Case: Data Product contract status check may miss edge cases

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataContractRepository.java:663

In getEffectiveDataContract(), the code only inherits from Data Product contracts that are APPROVED:

if (dataProductContract != null
    && dataProductContract.getEntityStatus() != EntityStatus.APPROVED) {
  dataProductContract = null;
}

This is good defensive logic, but there's no logging or indication to users when inheritance doesn't happen due to non-approved status. Consider adding a debug log statement when a DP contract exists but isn't inherited due to its status. This would help with debugging when users expect inheritance but don't see it happening.

Also, if the Data Product contract is in IN_REVIEW status, users might expect some visibility into pending contract changes that will eventually be inherited.

Quality: Inconsistent error message for inherited contract deletion

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataContractRepository.java:1227 📄 openmetadata-ui/src/main/resources/ui/src/locale/languages/en-us.json:2388
The backend error message in preDelete() is different from the i18n message added in locale files:

Backend (DataContractRepository.java):

throw BadRequestException.of(
    "Cannot delete an inherited data contract. The contract is inherited from a Data Product "
        + "and can only be removed by removing the entity from the Data Product or by creating "
        + "an entity-specific contract that overrides the inherited one.");

Locale files (en-us.json):

"inherited-contract-cannot-be-deleted": "Inherited contracts cannot be deleted. Remove the entity from the Data Product or create an entity-specific contract."

The backend message mentions "overrides the inherited one" while the locale message doesn't mention overriding. Consider aligning these messages for consistency. The locale message is also used in the frontend for user-facing errors, so users may see different messages depending on where the error originates.

Bug: Missing null check before setInherited calls in inheritFromDataProductContract

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataContractRepository.java:685
In inheritFromDataProductContract(), after using JsonUtils.deepCopy, the code calls setInherited(true) on various objects. While the outer null checks exist for most fields, the pattern is inconsistent:

// This is correct - checks before calling
if (inherited.getTermsOfUse() != null) {
  inherited.getTermsOfUse().setInherited(true);
}

However, the semantic rules loop doesn't check if inherited.getSemantics() is non-null before iterating:

// Line 712 - needs null check before loop
if (inherited.getSemantics() != null) {
  for (SemanticsRule rule : inherited.getSemantics()) {
    rule.setInherited(true);
  }
}

The code does have this check, so it's actually correct. After re-reading, all null checks appear to be in place. Disregard this finding - the code is correct.

Bug: validateContractWithEffective uses wrong entity reference for schema validation

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataContractRepository.java:766
In validateContractWithEffective(), schema validation is performed using effectiveContract.getEntity():

if (effectiveContract.getSchema() != null && !effectiveContract.getSchema().isEmpty()) {
  SchemaValidation schemaValidation =
      validateSchemaFieldsAgainstEntity(effectiveContract, effectiveContract.getEntity());
  result.withSchemaValidation(schemaValidation);
}

However, in inheritFromDataProductContract(), the inherited contract's entity reference is updated to point to the actual entity (the asset), not the Data Product:

inherited.setEntity(entity.getEntityReference());

But the schema is cleared for inherited contracts:

inherited.setSchema(null);

This means the schema validation block won't execute for purely inherited contracts (which is correct). However, in mergeContracts(), schema is NOT inherited from the Data Product contract, so merged contracts will only have the entity's schema. The code appears correct, but the function name validateContractWithEffective is slightly misleading since it doesn't use contractForResults.getEntity() for validation - it uses effectiveContract.getEntity().

After more analysis: The code is functionally correct. The effective contract's entity reference is set to the actual asset, so schema validation runs against the right entity. This finding is a false positive.

Performance: Unbounded asset retrieval in processDataProductContracts()

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/dataContracts/DataContractValidationApp.java:173-174
In processDataProductContracts(), assets are retrieved with a hardcoded limit of 1000:

ResultList<EntityReference> assetsResult =
    dataProductRepository.getDataProductAssets(dataProduct.getId(), 1000, 0);

This does not handle pagination if a Data Product has more than 1000 assets. Assets beyond the first 1000 will be silently skipped during contract validation.

Suggested fix:
Implement pagination to process all assets, similar to the pagination used in validateExistingContracts().

Bug: Type casting circumvents TypeScript type safety

📄 openmetadata-ui/src/main/resources/ui/src/components/DataContract/ContractTermOfService/ContractTermsOfService.component.tsx:53-56
In ContractTermsOfService.component.tsx, the handleContentOnChange function uses unsafe type casting:

const handleContentOnChange = (value: string) => {
  // Always save in the new object format
  onChange({
    termsOfUse: { content: value } as unknown as string,
  });
};

This as unknown as string bypasses TypeScript's type checking completely. If the DataContract type has been updated to expect TermsOfUse object for termsOfUse, the type definition should be updated to reflect this, rather than using dangerous casts.

Impact: This could lead to runtime errors or unexpected behavior if the upstream code expects a different type.

Suggested fix:
Update the DataContract type definition to properly reflect the new termsOfUse structure, or use proper type guards.

...and 15 more from earlier reviews

What Works Well

Clean separation between entity contracts and inherited contracts with proper merge logic. Comprehensive schema migrations for both MySQL and PostgreSQL. Extensive test coverage including 854+ lines of backend tests and 1300+ lines of E2E Playwright tests covering inheritance scenarios.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data Contract at Data Product Level

4 participants